library(tidyverse)
library(readxl)
path = "Excel/700-799/709/709 Filter Out Repeats.xlsx"
input = read_excel(path, range = "A2:B10")
test = read_excel(path, range = "C2:D8")
result = input %>%
filter(!(n_distinct(str_extract_all(Data2, "[A-Z]")[[1]]) == 1 &
str_length(Data2) != 1 &
n() != 1), .by = Data2) %>%
select(Data1, Data2)
all.equal(result, test)
# TRUEExcel BI - Excel Challenge 709
excel-challenges
excel-formulas
🔰 Answer Expected Data1 Data2 E A, B A, A, A B, C D, D Filter out (remove) those rows where all values get repeated in the cell in Data2 column and entire cell gets repeated in Data…

Challenge Description
🔰 Answer Expected Data1 Data2 E A, B A, A, A B, C D, D Filter out (remove) those rows where all values get repeated in the cell in Data2 column and entire cell gets repeated in Data2 column. Ex. A, A, A - All values (A here) get repeated in a cell and this entire cell gets repeated again in Data2 column.
Solutions
- Logic: Read the workbook ranges needed for the challenge; Parse the packed text or string structure; Aggregate or rank the data at the required grouping level.
- Strengths: The solution stays close to the text pattern itself, which makes the extraction logic easy to audit.
- Areas for Improvement: The solution assumes the workbook layout and selected ranges remain stable, so any structural change in the sheet would require small adjustments.
- Gem: A small number of well-targeted text patterns does most of the heavy lifting.
import pandas as pd
import re
path = "700-799/709/709 Filter Out Repeats.xlsx"
input = pd.read_excel(path, usecols="A:B", skiprows=1, nrows=9)
test = pd.read_excel(path, usecols="C:D", skiprows=1, nrows=6).rename(columns=lambda x: re.sub(r"\.1$", "", x))
def extract_distinct_uppercase(s):
return len(set(re.findall(r"[A-Z]", s)))
filtered = input.groupby("Data2").filter(
lambda group: not (
extract_distinct_uppercase(group.name) == 1 and
len(group.name) != 1 and
len(group) == 2
)
).reset_index(drop=True)
result = filtered[["Data1", "Data2"]]
print(result.equals(test)) # TrueThe Python version follows the same grouped logic and keeps the transformation explicit in a dataframe pipeline.
Difficulty Level
Medium
The individual steps are manageable, but the correct transformation pattern is not obvious from the raw data.